feat(scraper): add self-healing `scraper heal` command by meirk-brd · Pull Request #11 · brightdata/cli

meirk-brd · 2026-05-27T11:36:38Z

Summary

Adds `bdata scraper heal <collector_id> "<prompt>"`: AI self-healing for scrapers. When a scraper drifts (selectors move, a page is redesigned), the agent fixes it in place instead of rebuilding it, so the saved `collector_id` keeps working and improves.

- One new command, `scraper heal`, the maintenance twin of `scraper create`: `POST /dca/collectors/{id}/refactor_template`, then poll `refactor_template/progress`, reusing the existing async trigger→poll machinery (`poll_until`, `build_ai_trigger_retry`, `SCRAPER_BODY_HINTS`, the 429 backoff).
- The agent is the detector. The CLI never guesses that a scraper is "broken", because a heal is slow, billable, and mutating. The agent inspects run output and decides. So `run` stays read-only: there is no `--heal` flag on `run`.
- Closes the loop. On success, `heal` emits a `{collector_id, status, completed_steps, prompt, view_url, next_step}` envelope (same shape as `create`), where `next_step` is a ready-to-run `bdata scraper run <id> <url>` verify command (`--url` bakes the real URL in). The intended agent flow is: run → inspect → heal → re-run → verify.
- Failure is non-destructive. A failed heal leaves the existing scraper unchanged and still working; the recovery note says so (distinct from `create`'s "half-built collector" wording).
- Required `<prompt>` (≤1000 chars, validated fail-fast); carries over `--timeout`, `--max-retries`/`--no-retry`, `-o`/`--json`/`--pretty`/`--legacy-output`/`--timing`/`-k`.
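As a rough illustration of the prompt check and the success envelope described above, here is a minimal TypeScript sketch. It is not the code in this PR: the function names, error messages, the `string[]` type for `completed_steps`, and the `<url>` fallback placeholder are all assumptions made for the example. Only the field names, the 1000-character limit, and the `bdata scraper run <id> <url>` shape come from the description.

```typescript
// Illustrative sketch only: names, types, and messages are assumptions,
// not the CLI's actual implementation.

const MAX_PROMPT_CHARS = 1000; // limit stated in the PR summary

interface HealEnvelope {
  collector_id: string;
  status: string;
  completed_steps: string[]; // element type is an assumption
  prompt: string;
  view_url: string;
  next_step: string;
}

// Returns an error message, or null when the prompt is acceptable.
function validateHealPrompt(prompt: string): string | null {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    return "prompt is required";
  }
  if (trimmed.length > MAX_PROMPT_CHARS) {
    return `prompt must be at most ${MAX_PROMPT_CHARS} characters, got ${trimmed.length}`;
  }
  return null;
}

// Builds the verify command; uses a <url> placeholder when no URL is known.
function buildNextStep(collectorId: string, url?: string): string {
  return `bdata scraper run ${collectorId} ${url ?? "<url>"}`;
}

// Assembles the envelope and derives next_step from the collector id and URL.
function buildHealEnvelope(
  fields: Omit<HealEnvelope, "next_step">,
  url?: string,
): HealEnvelope {
  return { ...fields, next_step: buildNextStep(fields.collector_id, url) };
}
```

Validating before any request is sent keeps a malformed prompt from triggering a slow, billable heal, which matches the fail-fast intent stated in the summary.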

Built test-first; 25 new tests. Design + plan in docs/superpowers/.

Test Plan

- `pnpm type-check` clean
- `pnpm build` clean
- `src/__tests__/commands/scraper.test.ts`: 131/131 pass (incl. validation, all failure paths, retry forwarding, legacy-output, progress-endpoint URL, exit codes, subcommand wiring)
- `bdata scraper heal --help` shows args/options/examples; `bdata scraper --help` lists `heal`
- `bdata scraper run --help` has no `--heal` option (design invariant)
- Live run against a real collector (reviewer, with credentials)

Note: the repo's full suite has 8 pre-existing failures in browser/daemon/discover/scrape tests that also fail on main (0.3.0) — unrelated to this change. Agent-facing docs are in a companion PR on the brightdata-plugin repo (scraper-studio skill: Action 3, api-flow, recipe, common-mistake).

meirk-brd · 2026-05-27T15:46:01Z

Update: self-healing approval gate + `scraper approve`

Live e2e testing (the reason we held the merge) uncovered that the self-healing AI flow is human-in-the-loop: it pauses at `status: "pending_answer"` / `step: "user_approval"` and never auto-completes. The previous `heal` polled that state until timeout and reported a misleading error. This update fixes that and adds the agent-driven approval path that the engineer's `resume_automation_job` endpoint enables.
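To make the gate concrete, the following is a hedged sketch of how a progress response might be mapped to a poll status so that a poll loop stops at the approval gate instead of running to timeout. The `pending_answer` value and the `awaiting_approval` sentinel come from this PR; the type names, the raw `"done"` and `"failed"` status strings, and the helper `shouldStopPolling` are assumptions for illustration.

```typescript
// Illustrative sketch only: the raw "done"/"failed" values and all names
// here are assumptions, not the CLI's actual implementation.

type PollStatus = "running" | "done" | "failed" | "awaiting_approval";

interface ProgressResponse {
  status?: string;
  step?: string; // e.g. "user_approval" while the job waits at the gate
}

// Maps a raw progress payload to the status the poll loop acts on.
function extractProgressStatus(res: ProgressResponse): PollStatus {
  switch (res.status) {
    case "pending_answer":
      // Human-in-the-loop gate: the job will not advance on its own.
      return "awaiting_approval";
    case "done":
      return "done";
    case "failed":
      return "failed";
    default:
      // Anything unrecognized is treated as still in progress.
      return "running";
  }
}

// The poll loop keeps going only while the job is still running.
function shouldStopPolling(status: PollStatus): boolean {
  return status !== "running";
}
```

Treating the gate as a terminal state for the poll loop is what turns the former timeout into an actionable result.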

What changed (11 + 1 commits on this branch since the original heal):

- `extract_progress_status` now recognizes `pending_answer` → an `awaiting_approval` gate sentinel, so the shared poll stops at the gate instead of timing out.
- `scraper heal` (default) stops at the gate: it emits `status: "awaiting_approval"` with `preview_result` (sample rows the fix would produce) plus a compact `diff_summary`, and a `next_step` pointing at the approve command. Exit 0: the heal succeeded, it just needs a decision.
- New `scraper approve <collector_id>` command (`--reject` to discard): `POST /dca/collectors/{id}/resume_automation_job {message}`, polls to done, and hands back a `scraper run` verify `next_step`. Re-runnable if a heal needs multiple approvals.
- `scraper heal --auto-approve` for the autonomous path (approve + poll to done in one command).
- Shared `resume_and_poll` + `emit_heal_terminal` seams keep `heal` and `approve` consistent (incl. `resume_failed` vs `poll_failed` labeling).
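From the agent's side, the gate envelope could drive a decision like the one sketched below. This is a hypothetical consumer, not part of the CLI: the `GateEnvelope` type, `decideAfterHeal`, `approveCommand`, and the caller-supplied `previewLooksRight` check are invented for the example, and the `"done"` status string is an assumption. The `preview_result`, `diff_summary`, and `next_step` fields and the `scraper approve` / `--reject` commands are taken from the list above.

```typescript
// Illustrative sketch only: a hypothetical agent-side consumer of the
// heal envelope. Names and the "done" status string are assumptions.

interface GateEnvelope {
  collector_id: string;
  status: string;
  next_step?: string;
  preview_result?: unknown[]; // sample rows the fix would produce
  diff_summary?: unknown; // shape not specified in the PR description
}

// Builds the approve (or reject) command for a gated heal.
function approveCommand(collectorId: string, reject = false): string {
  return reject
    ? `bdata scraper approve ${collectorId} --reject`
    : `bdata scraper approve ${collectorId}`;
}

// Returns the next command the agent should run, or null when the
// envelope offers nothing to act on (for example, after a failed heal).
function decideAfterHeal(
  envelope: GateEnvelope,
  previewLooksRight: (rows: unknown[]) => boolean,
): string | null {
  if (envelope.status === "awaiting_approval") {
    const rows = envelope.preview_result ?? [];
    return approveCommand(envelope.collector_id, !previewLooksRight(rows));
  }
  if (envelope.status === "done") {
    // Heal finished: follow the verify hint if one was provided.
    return envelope.next_step ?? null;
  }
  return null;
}
```

Because a failed heal leaves the existing scraper unchanged, returning `null` here is safe: the agent can report the failure without any cleanup step.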

Verification:

- `pnpm type-check`, `pnpm build` clean.
- `src/__tests__/commands/scraper.test.ts`: 152/152 (+21 tests).
- Live e2e against the real API: healed a real collector → it stopped at the gate with the preview → `scraper approve` resumed it via the live `resume_automation_job` endpoint → polled to done → verify run returned data. The approve → done pipeline is confirmed end-to-end.
- Invariants hold: `scraper run` stays read-only (no `--heal`/`--approve`); `approve` has no retry flags.

The 8 failing tests in the full suite are pre-existing (browser/daemon/discover/scrape) and present on main — unrelated to this change. Agent-facing docs updated in the companion skills PR (Action 4 + resume endpoint + recipe).

meirk-brd added 12 commits May 27, 2026 13:12

types(scraper): add heal request/envelope/opts types

6aa957c

chore(scraper): add heal constants and test imports

aa7a258

feat(scraper): validate_heal_prompt for heal command

b965d3d

feat(scraper): build_refactor_request body builder

c951a59

feat(scraper): build_next_step verify-hint builder

580a258

feat(scraper): build_heal_envelope output shape

2ab4abb

feat(scraper): print_heal_recovery_note (non-destructive)

c655af2

refactor(scraper): tidy heal prompt validation and test object

d4290ca

feat(scraper): handle_heal_scraper orchestration

496ff54

feat(scraper): heal poll-attempt log and format_heal_summary test

f63ebe5

feat(scraper): wire heal subcommand with examples

988d437

test(scraper): harden heal fail-fast, progress-url, and exit assertions

9024979

meirk-brd mentioned this pull request May 27, 2026

docs(scraper-studio): document scraper heal self-healing brightdata/skills#23

Merged

4 tasks

meirk-brd added 12 commits May 27, 2026 17:54

types(scraper): gate fields for heal approval + approve opts

9adf9a5

feat(scraper): recognize pending_answer approval gate in poll

c51d269

feat(scraper): build_diff_summary for approval-gate envelope

96cc0b1

feat(scraper): build_approve_next_step hint builder

1d91500

feat(scraper): heal envelope carries preview + diff on gate

b2fa539

style(scraper): trailing comma on heal envelope gate spread

735760c

feat(scraper): resume_and_poll shared resume+poll helper

e45137c

feat(scraper): heal stops at approval gate + --auto-approve

a91d97e

feat(scraper): handle_approve_scraper resume/reject orchestration

0dc2f2e

fix(scraper): label auto-approve resume failure as resume_failed

6ee2e1d

feat(scraper): wire approve subcommand and heal --auto-approve

c84da43

refactor(scraper): drop dead resume type, tidy approve summary

41bdd1e

docs(readme): document scraper heal and approve commands

e502789

meirk-brd merged commit 5e96df2 into main May 28, 2026

meirk-brd merged 25 commits into main from feat/scraper-self-healing
